Send/receive layers to reduce buffer transfer time #49
Conversation
Tagging @CatherineThomas-NOAA for awareness. These changes should not alter output from the analcalc job, but I have not run any tests to confirm this.
Changes seem OK.
Ran the 20211221 06Z gdasanalcalc job from the g-w CI C96C48_hybatmDA case twice. The first run used gsi_utils at 9382fd0; the second run used gsi_utils from DavidHuber-NOAA:fix/send_recv.
The analysis files are bitwise identical between the two runs.
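For reference, the "bitwise identical" check above can be written down concretely; a tool such as `cmp` does the same job from the command line. The sketch below compares two files byte for byte. It is a hedged illustration only: the file names are placeholders, not the actual analysis file names from this test.

```fortran
! Hedged sketch: a byte-for-byte comparison of two files, the kind of check
! behind the "bitwise identical" statement above.  The file names below are
! placeholders, not the actual analysis file names from this test.
program compare_bytes
  use iso_fortran_env, only: int8
  implicit none
  character(len=*), parameter :: file_a = 'run1_atmanl.nc'   ! hypothetical name
  character(len=*), parameter :: file_b = 'run2_atmanl.nc'   ! hypothetical name
  integer(int8) :: byte_a, byte_b
  integer :: unit_a, unit_b, ios_a, ios_b
  logical :: identical

  open(newunit=unit_a, file=file_a, access='stream', form='unformatted', status='old')
  open(newunit=unit_b, file=file_b, access='stream', form='unformatted', status='old')

  identical = .true.
  do
     read(unit_a, iostat=ios_a) byte_a
     read(unit_b, iostat=ios_b) byte_b
     if (ios_a /= 0 .or. ios_b /= 0) then
        if (ios_a /= ios_b) identical = .false.   ! files have different lengths
        exit
     end if
     if (byte_a /= byte_b) then
        identical = .false.                       ! first differing byte found
        exit
     end if
  end do

  close(unit_a)
  close(unit_b)
  print *, 'bitwise identical: ', identical
end program compare_bytes
```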
interp_inc.x from this PR ran a bit slower than the original interp_inc.x.

original interp_inc.x
The total amount of wall time = 0.614682
The total amount of wall time = 0.613466
The total amount of wall time = 0.612958
updated interp_inc.x
The total amount of wall time = 1.035355
The total amount of wall time = 1.906664
The total amount of wall time = 1.134823
I'm not sure if these timings are significant. The analysis resolution is very low. The tests were run on Hercules using the /work/noaa/stmp fileset. Use of this fileset is known to produce wall time variability.
@CatherineThomas-NOAA do you have a COMROOT from a GFS v17 experiment run at operational resolution (C768 deterministic, C384 ensemble)? If not, do we have cold start initial conditions for GFS v17 at this resolution? I ask because I would like to see how the revised interp_inc.x performs at operational resolution.
@RussTreadon-NOAA I have some ICs available in
I also have recent experiments on Hera: COM:
Cactus test

Ran gdas jobs twice for 20211221 00Z. The first run used the original
The wall times above correspond to the following executables
This PR only changes interp_inc.x. Given this, the 20211221 00Z gdas cycle was rerun using the updated interp_inc.x.
The rerun is faster than the original run but still 65 seconds, or 23%, slower than the control. While the above is not conclusive, it suggests that the changes in this PR increase the interp_inc.x wall time. Tagging @CatherineThomas-NOAA for awareness.
@RussTreadon-NOAA @aerorahul @CatherineThomas-NOAA I performed a series of tests on WCOSS2 Dogwood at C96-C768 resolutions, running a total of 5 cycles at each resolution and comparing the runtimes from the develop and fix/send_recv branches. Below is a chart showing the mean runtimes from the 15 interp_inc.x executions at each resolution and the differences, along with the compared runtimes for the analcalc job. Attached is the spreadsheet where these were calculated.

Overall, this change appears to result in a slight increase in runtimes, most noticeable at C768, with a mean increase in runtime of 11s (~3.35%) in the analcalc job.
@DavidHuber-NOAA
Looks good.
As documented in the PR:
- changes are reproducible
- increase in runtime of the job is under 4%
Thank you @DavidHuber-NOAA and @RussTreadon-NOAA for your rigorous testing on this. The results look reasonable to me.
Merging based on approval comments from @CatherineThomas-NOAA.
* origin/develop: Send/receive layers to reduce buffer transfer time (NOAA-EMC#49)
This reduces the amount of data sent/received when collecting data to be written to the output interpolated increment netCDF file.
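For context, the idea behind that change can be sketched as follows: rather than transferring a full 3D buffer to the root task, each rank sends only the 2D layers it owns, one slice at a time, and the root places them directly into the array that is later written to the increment netCDF file. The block below is a minimal illustration under assumed conditions, not the actual interp_inc source code; the grid dimensions, the round-robin layer ownership, and all names are invented for the example.

```fortran
! Minimal MPI sketch of per-layer send/receive (not the actual interp_inc code).
! Each non-root rank sends only the 2D layers it owns; the root receives them
! into the 3D array that would be written to the output increment file.
program send_layers_sketch
  use mpi
  implicit none
  integer, parameter :: nlon = 8, nlat = 4, nlev = 6   ! assumed toy dimensions
  real(8), allocatable :: layer(:,:), field(:,:,:)
  integer :: ierr, myrank, ntasks, k, owner

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, myrank, ierr)
  call mpi_comm_size(mpi_comm_world, ntasks, ierr)

  allocate(layer(nlon, nlat))
  if (myrank == 0) allocate(field(nlon, nlat, nlev))

  do k = 1, nlev
     owner = mod(k - 1, ntasks)           ! assumed round-robin layer ownership
     if (myrank == 0) then
        if (owner == 0) then
           field(:,:,k) = real(k, 8)      ! root's own layer, no transfer needed
        else
           call mpi_recv(field(:,:,k), nlon*nlat, mpi_real8, owner, k, &
                         mpi_comm_world, mpi_status_ignore, ierr)
        end if
     else if (myrank == owner) then
        layer = real(k, 8)                ! stand-in for this rank's computed layer
        call mpi_send(layer, nlon*nlat, mpi_real8, 0, k, mpi_comm_world, ierr)
     end if
  end do

  ! ... at this point the root task would write "field" to the netCDF file ...
  call mpi_finalize(ierr)
end program send_layers_sketch
```

In this sketch each message carries only nlon*nlat values, so the total data moved matches what is actually written out, rather than every task shipping a full nlon*nlat*nlev buffer.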